
    Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data

    BACKGROUND: Mass spectrometry for biological data analysis is an active field of research, providing an efficient way of high-throughput proteome screening. A popular variant of mass spectrometry is SELDI, which is often used to measure sample populations with the goal of developing (clinical) classifiers. Unfortunately, not only is the data resulting from such measurements quite noisy, variance between replicate measurements of the same sample can be high as well. Normalisation of spectra can greatly reduce the effect of this technical variance and further improve the quality and interpretability of the data. However, it is unclear which normalisation method yields the most informative result. RESULTS: In this paper, we describe the first systematic comparison of a wide range of normalisation methods, using two objectives that should be met by a good method. These objectives are minimisation of inter-spectra variance and maximisation of signal with respect to class separation. The former is assessed using an estimation of the coefficient of variation, the latter using the classification performance of three types of classifiers on real-world datasets representing two-class diagnostic problems. To obtain a maximally robust evaluation of a normalisation method, both objectives are evaluated over multiple datasets and multiple configurations of baseline correction and peak detection methods. Results are assessed for statistical significance and visualised to reveal the performance of each normalisation method, in particular with respect to using no normalisation. The normalisation methods described have been implemented in the freely available MASDA R-package. CONCLUSION: In the general case, normalisation of mass spectra is beneficial to the quality of data. The majority of methods we compared performed significantly better than the case in which no normalisation was used. We have shown that normalisation methods that scale spectra by a factor based on the dispersion (e.g., standard deviation) of the data clearly outperform those where a factor based on the central location (e.g., mean) is used. Additional improvements in performance are obtained when these factors are estimated locally, using a sliding window within spectra, instead of globally, over full spectra. The underperforming category of methods using a globally estimated factor based on the central location of the data includes the method used by the majority of SELDI users.
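The distinction this abstract draws between globally and locally estimated scaling factors, and between location-based and dispersion-based factors, can be illustrated with a minimal sketch. This is not the MASDA implementation: the function names, the window size, the use of the population standard deviation, and the handling of flat windows are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalise_global_mean(spectrum):
    """Scale a spectrum by one factor: its mean intensity (central location)."""
    m = mean(spectrum)
    return [x / m for x in spectrum]


def normalise_global_sd(spectrum):
    """Scale a spectrum by one factor: its standard deviation (dispersion)."""
    s = pstdev(spectrum)
    return [x / s for x in spectrum]


def normalise_local_sd(spectrum, half_window=2):
    """Scale each point by the standard deviation of a sliding window around it.

    half_window is a hypothetical parameter; the window is clipped at the
    spectrum edges. A flat window (SD of zero) leaves the point unchanged,
    which is an arbitrary choice for this sketch.
    """
    n = len(spectrum)
    out = []
    for i, x in enumerate(spectrum):
        window = spectrum[max(0, i - half_window):min(n, i + half_window + 1)]
        s = pstdev(window)
        out.append(x / s if s > 0 else x)
    return out


def coefficient_of_variation(values):
    """CV = standard deviation / mean, e.g. for one peak across replicates."""
    return pstdev(values) / mean(values)


if __name__ == "__main__":
    replicate_a = [1.0, 2.0, 3.0, 6.0]
    replicate_b = [2.0 * x for x in replicate_a]  # same sample, doubled gain
    print(normalise_global_mean(replicate_a))
    print(normalise_global_mean(replicate_b))
```

In this toy case the two replicates differ only by a constant gain, so global mean scaling makes them coincide and the per-peak coefficient of variation across replicates drops to zero. Real technical variance is rarely a pure gain difference, which is where the abstract reports dispersion-based and locally estimated factors doing better.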

    Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset

    Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.

    Harvest: an open-source tool for the validation and improvement of peptide identification metrics and fragmentation exploration

    BACKGROUND: Protein identification using mass spectrometry is an important tool in many areas of the life sciences, and in proteomics research in particular. Increasing the number of proteins correctly identified is dependent on the ability to include new knowledge about the mass spectrometry fragmentation process in computational algorithms designed to separate true matches of peptides to unidentified mass spectra from spurious matches. This discrimination is achieved by computing a function of the various features of the potential match between the observed and theoretical spectra to give a numerical approximation of their similarity. It is these underlying "metrics" that determine the ability of a protein identification package to maximise correct identifications while limiting false discovery rates. There is currently no software available specifically for the simple implementation and analysis of arbitrary novel metrics for peptide matching and for the exploration of fragmentation patterns for a given dataset. RESULTS: We present Harvest: an open source software tool for analysing fragmentation patterns and assessing the power of a new piece of information about the MS/MS fragmentation process to more clearly differentiate between correct and random peptide assignments. We demonstrate this functionality using data metrics derived from the properties of individual datasets in a peptide identification context. Using Harvest, we demonstrate how the development of such metrics may improve correct peptide assignment confidence in the context of a high-throughput proteomics experiment and characterise properties of peptide fragmentation. CONCLUSIONS: Harvest provides a simple framework in C++ for analysing and prototyping metrics for peptide matching, the core of the protein identification problem. It is not a protein identification package and answers a different research question to packages such as Sequest, Mascot, X!Tandem, and other protein identification packages. It does not aim to maximise the number of assigned peptides from a set of unknown spectra, but instead provides a method by which researchers can explore fragmentation properties and assess the power of novel metrics for peptide matching in the context of a given experiment. Metrics developed using Harvest may then become candidates for later integration into protein identification packages.

    A simpler method of preprocessing MALDI-TOF MS data for differential biomarker analysis: stem cell and melanoma cancer studies

    INTRODUCTION: Raw spectral data from matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) with MS profiling techniques usually contains complex information not readily providing biological insight into disease. The association of identified features within raw data to a known peptide is extremely difficult. Data preprocessing to remove uncertainty characteristics in the data is normally required before performing any further analysis. This study proposes an alternative yet simple solution to preprocess raw MALDI-TOF-MS data for identification of candidate marker ions. Two in-house MALDI-TOF-MS data sets from two different sample sources (melanoma serum and cord blood plasma) are used in our study. METHOD: Raw MS spectral profiles were preprocessed using the proposed approach to identify peak regions in the spectra. The preprocessed data was then analysed using bespoke machine learning algorithms for data reduction and ion selection. Using the selected ions, an ANN-based predictive model was constructed to examine the predictive power of these ions for classification. RESULTS: Our model identified 10 candidate marker ions for both data sets. These ion panels achieved over 90% classification accuracy on blind validation data. Receiver operating characteristics analysis was performed and the area under the curve for melanoma and cord blood classifiers was 0.991 and 0.986, respectively. CONCLUSION: The results suggest that our data preprocessing technique removes unwanted characteristics of the raw data, while preserving the predictive components of the data. Ion identification analysis can be carried out using MALDI-TOF-MS data with the proposed data preprocessing technique coupled with bespoke algorithms for data reduction and ion selection.
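The area under the ROC curve reported in this abstract can be read as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The sketch below computes it by direct pair counting. The function name and the toy scores are hypothetical and are not taken from the study, whose own ROC procedure is not described in the abstract.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by pair counting.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: 1 for positive samples, 0 for negative samples.
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores, not data from the study.
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

Pair counting is quadratic in the number of samples, which is fine for small validation sets; a rank-based formulation gives the same value more efficiently on large ones.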

    Rare Copy Number Variants Observed in Hereditary Breast Cancer Cases Disrupt Genes in Estrogen Signaling and TP53 Tumor Suppression Network

    Breast cancer is the most common cancer in women in developed countries, and the contribution of genetic susceptibility to breast cancer development has been well-recognized. However, a great proportion of these hereditary predisposing factors still remain unidentified. To examine the contribution of rare copy number variants (CNVs) in breast cancer predisposition, high-resolution genome-wide scans were performed on genomic DNA of 103 BRCA1, BRCA2, and PALB2 mutation negative familial breast cancer cases and 128 geographically matched healthy female controls; for replication an independent cohort of 75 similarly mutation negative young breast cancer patients was used. All observed rare variants were confirmed by independent methods. The studied breast cancer cases showed a consistent increase in the frequency of rare CNVs when compared to controls. Furthermore, the biological networks of the disrupted genes differed between the two groups. In familial cases the observed mutations disrupted genes, which were significantly overrepresented in cellular functions related to maintenance of genomic integrity, including DNA double-strand break repair (P = 0.0211). Biological network analysis in the two independent breast cancer cohorts showed that the disrupted genes were closely related to estrogen signaling and TP53 centered tumor suppressor network. These results suggest that rare CNVs represent an alternative source of genetic variation influencing hereditary risk for breast cancer

    Reproducible Cancer Biomarker Discovery in SELDI-TOF MS Using Different Pre-Processing Algorithms

    BACKGROUND: There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on a data pre-processing algorithm. Until now, no widely accepted agreement has been reached. RESULTS: In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased. CONCLUSIONS: Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers
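The stratified FDR idea mentioned in this abstract can be sketched as running a standard FDR procedure separately within groups of tests, so that a small group with strong signals is not swamped by a large group of null tests. The sketch below uses the Benjamini-Hochberg step-up procedure within each stratum. How the authors define strata is not stated in the abstract, so the `strata` labels, the function names, and the example p-values are all assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply Benjamini-Hochberg separately within each stratum.

    strata: one hashable label per p-value (hypothetical grouping).
    """
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = [False] * len(pvalues)
    for indices in groups.values():
        flags = benjamini_hochberg([pvalues[i] for i in indices], alpha)
        for i, flag in zip(indices, flags):
            rejected[i] = flag
    return rejected


if __name__ == "__main__":
    # Toy p-values, not data from the study.
    p = [0.001, 0.012, 0.03, 0.5, 0.7, 0.9, 0.8, 0.6]
    groups = ["a", "a", "a", "b", "b", "b", "b", "b"]
    print(sum(benjamini_hochberg(p)))    # pooled control
    print(sum(stratified_bh(p, groups))) # per-stratum control
```

With these toy values the pooled procedure rejects two hypotheses while the stratified one rejects three, because the third p-value only has to clear the threshold of its own small stratum. Whether stratification helps in practice depends on choosing strata without looking at the p-values themselves.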

    Small molecules, big targets: drug discovery faces the protein-protein interaction challenge.

    Protein-protein interactions (PPIs) are of pivotal importance in the regulation of biological systems and are consequently implicated in the development of disease states. Recent work has begun to show that, with the right tools, certain classes of PPI can yield to the efforts of medicinal chemists to develop inhibitors, and the first PPI inhibitors have reached clinical development. In this Review, we describe the research leading to these breakthroughs and highlight the existence of groups of structurally related PPIs within the PPI target class. For each of these groups, we use examples of successful discovery efforts to illustrate the research strategies that have proved most useful. JS, DES and ARB thank the Wellcome Trust for funding. This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/nrd.2016.2

    The stability of multitrophic communities under habitat loss

    Habitat loss (HL) affects species and their interactions, ultimately altering community dynamics. Yet, a challenge for community ecology is to understand how communities with multiple interaction types—hybrid communities—respond to HL prior to species extinctions. To this end, we develop a model to investigate the response of hybrid terrestrial communities to two types of HL: random and contiguous. Our model reveals changes in stability—temporal variability in population abundances—that are dependent on the spatial configuration of HL. Our findings highlight that habitat area determines the variability of populations via changes in the distribution of species interaction strengths. The divergent responses of communities to random and contiguous HL result from different constraints imposed on individuals’ mobility, impacting diversity and network structure in the random case, and destabilising communities by increasing interaction strength in the contiguous case. Analysis of intermediate HL suggests a gradual transition between the two extreme cases

    Randomized feasibility trial of the Scleroderma patient-centered intervention network hand exercise program (SPIN-HAND): Study protocol

    BACKGROUND: Significant functional impairment of the hands is nearly universal in systemic sclerosis (SSc, scleroderma). Hand exercises may improve hand function, but developing, testing and disseminating rehabilitation interventions in SSc is challenging. The Scleroderma Patient-centered Intervention Network (SPIN) was established to address this issue and has developed an online hand exercise program to improve hand function for SSc patients (SPIN-HAND). The aim of the proposed feasibility trial is to evaluate the feasibility of conducting a full-scale randomized controlled trial (RCT) of the SPIN-HAND intervention. DESIGN AND METHODS: The SPIN-HAND feasibility trial will be conducted via the SPIN Cohort. The SPIN Cohort was developed as a framework for embedded pragmatic trials using the cohort multiple RCT design. In total, 40 English-speaking SPIN Cohort participants with at least mild hand function limitations (Cochin Hand Function Scale ≥3) and an indicated interest in using an online hand-exercise intervention will be randomized with a 1:1 ratio to be offered the SPIN-HAND program or usual care for 3 months. The primary aim is to evaluate the trial implementation processes, required resources and management, scientific aspects, and participant acceptability and usage of the SPIN-HAND program. DISCUSSION: The SPIN-HAND exercise program is a self-help tool that may improve hand function in patients with SSc. The SPIN-HAND feasibility trial will ensure that trial methodology is robust, feasible, and consistent with trial participant expectations. The results will guide adjustments that need to be implemented before undertaking a full-scale RCT of the SPIN-HAND program. TRIAL REGISTRATION: ClinicalTrials.gov identifier: NCT03092024